## Chapter 10

1a. The conditional average treatment effect is the average treatment effect among the portion of the sample that shares the same value of a certain variable.

1b. The average treatment effect on the treated is the average treatment effect among only the portion of your study sample that actually received treatment.

1c. The average treatment effect on the untreated is the average treatment effect among only the portion of your study sample that did not receive treatment.
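
As a rough illustration of how 1a through 1c differ, the sketch below averages the same set of individual treatment effects over different portions of a sample. The data frame `d`, its columns, and all of its numbers are made up for this example; in real data we never observe individual treatment effects directly.

```r
# Hypothetical individual-level treatment effects (invented for illustration)
d <- data.frame(
  effect  = c(5, 6, 7, 1, 2, 3),
  treated = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  group   = c("A", "A", "B", "B", "A", "B")
)

ate  <- mean(d$effect)                  # everyone in the sample
cate <- mean(d$effect[d$group == "A"])  # only people with group == "A"
att  <- mean(d$effect[d$treated])       # only people who were treated
atu  <- mean(d$effect[!d$treated])      # only people who were not treated

print(c(ATE = ate, CATE_A = cate, ATT = att, ATU = atu))
```

The only thing that changes between the four averages is which rows are included.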

2. Enrollment in an exercise program with the intended outcome of weight loss. The treatment effect will differ based on diet, previous fitness level, previous weight/BMI, and age. Differences across the sample in these variables will produce high heterogeneity in the effect of the exercise program on weight loss.

3a. Overall average treatment effect

```r
(7+3+7+8+7+4)/6
## [1] 6
```

3b. Average treatment effect for women:

```r
(3+7+4)/3
## [1] 4.666667
```

3c. We will get a variance-weighted average treatment effect. In a variance-weighted average treatment effect, each group is weighted by how much the treatment varies within that group, not by the share of the group that receives treatment. All nonbinary people receive treatment, so treatment has very low variance in that group (in fact zero variance in our sample, where no nonbinary person is untreated, so we never observe an untreated comparison for them). Only half of women receive treatment, which is the highest variance in treatment a group can have, so the women’s term is far more influential in the overall variance-weighted average. Therefore, the variance-weighted average will more closely reflect the women’s average treatment effect (rather than the nonbinary group’s average treatment effect).
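
A minimal sketch of the weighting in 3c, assuming each group's weight is proportional to the variance of its binary treatment, p(1 - p), and that the groups are equally sized. The women's effect is the value from 3b; the nonbinary effect of 7 and the 95% treatment share are hypothetical numbers chosen only for illustration.

```r
# Group average treatment effects
ate_women     <- (3 + 7 + 4) / 3   # from 3b
ate_nonbinary <- 7                 # hypothetical value, for illustration only

# Variance of a binary treatment when a share p of the group is treated
treat_var <- function(p) p * (1 - p)

# Case 1: half of women treated, all nonbinary people treated
w_all  <- c(treat_var(0.5), treat_var(1))
vw_all <- weighted.mean(c(ate_women, ate_nonbinary), w_all)

# Case 2: half of women treated, nearly all (assumed 95%) nonbinary people treated
w_nearly  <- c(treat_var(0.5), treat_var(0.95))
vw_nearly <- weighted.mean(c(ate_women, ate_nonbinary), w_nearly)

print(c(all_treated = vw_all, nearly_all_treated = vw_nearly))
```

In the first case the nonbinary group gets zero weight, so the result equals the women's effect. In the second case it gets a small weight, so the result sits only slightly above the women's effect.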

3d. The population is people of all ages, but only teenagers get treatment, so the estimate can only reflect the treatment effect among teenagers. Teenagers are the only group with any variation in treatment, so every other age group contributes nothing to the estimate. In practice, this means we get the conditional average treatment effect for teenagers (which here is also the average treatment effect on the treated) rather than an overall average treatment effect. You could also think of it as a variance-weighted average treatment effect in which every age group other than teenagers gets zero weight, because the presence of other age groups in the treatment condition is zero.

4. If for some reason the untreated group is impossible to treat, then the treatment effect on the treated is more useful than the overall average treatment effect. For instance, in the case of the exercise program, an untreated group that is impossible to treat might be infants. In this case, it would not make sense to consider the treatment effect on the untreated (infants), because there is no conceivable world in which infants would enroll in the exercise program. When the untreated group is untreated because it is either a) impossible or b) not sensible to treat them, we should prioritize the treatment effect on the treated over the overall average treatment effect.

5. Intent-to-treat (C)

6a. A variance-weighted treatment effect weights the treatment effect of each group in the sample by how much the treatment varies within that group. For a binary treatment, that variance is highest when half the group is treated and shrinks toward zero as the share treated approaches 0% or 100%. For instance, if treatment was received by 50% of women but 95% of men in a study, then the average treatment effect for women would be weighted more heavily in the estimate. Variance-weighted treatment effects typically arise when we close back doors by controlling for a variable: the estimate then relies most on the subpopulations where treatment varies the most, so it is not necessarily representative of the population as a whole.
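
One common way to write this weighting down, assuming a binary treatment and a regression that controls for group membership: let $n_g$ be the size of group $g$, $p_g$ the share of that group that is treated, and $\tau_g$ the group's average treatment effect. All of this notation is introduced here for illustration and is not taken from the chapter.

```latex
\hat{\tau}_{VW} = \frac{\sum_g n_g \, p_g (1 - p_g) \, \tau_g}{\sum_g n_g \, p_g (1 - p_g)}
```

The weight $p_g(1 - p_g)$ peaks at 0.25 when $p_g = 0.5$ and falls to 0 as $p_g$ approaches 0 or 1, so a group in which everyone (or no one) is treated drops out of the average.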

6b. A distribution-weighted treatment effect arises when you select a sample such that the treated and untreated groups have similar values of a back-door-path variable (for example, by matching). The resulting sample consists more heavily of individuals with common values of that variable, so their treatment effects are weighted more heavily. The estimate then reflects the treatment effects of people with really common values of the back-door-path variable you’ve matched on, rather than those of the general population.

6c. We would get a variance-weighted treatment effect when we close back doors by controlling for a variable and the treatment varies much more within some subpopulations than within others. We would get a distribution-weighted treatment effect when the sample is selected (for example, by matching) to consist primarily of individuals who share some common back-door characteristic.

7. Conditional average treatment effect: the population of interest is all adults in the US, but the sampling frame is only students at the university. The treatment effects will reflect only the values of individuals with the shared characteristic of “university student.”

8a. Average treatment effect

8b. Conditional average treatment effect

8c. Distribution weighted average treatment effect

8d. Local average treatment effect

8e. Average treatment effect on the treated

## Chapter 11

1. One method that you could use is to isolate only the variation in the treatment that comes from causes unrelated to the outcome except through the treatment itself. If we can do this, then we effectively close off possible back doors without actually controlling for anything at all, and without collecting data on various campus characteristics.

2. If we can isolate only the portions of the variation in treatment that are unrelated to the outcome except through the treatment, then we have excluded the possibility of using variation that comes from a confounding back-door variable. Therefore, we can safely ignore back doors.

3a. A robustness test is a way of checking whether we can disprove an assumption, or a way of redoing our analysis in a way that doesn’t rely on the assumption, and seeing if the result changes.

3b. The purpose of a robustness test is to determine the validity of an assumption. If the robustness test fails, then the validity of a study that relies on that assumption is jeopardized.

3c. Placebo tests check an assumption by running your analysis in a setting where the treatment cannot have an effect. You pretend that a treatment is being assigned when it actually isn’t, and see if you estimate an effect. If you find an effect of the placebo treatment, then you must have a bad assumption somewhere, because you have found an effect where there should not have been one.
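
A placebo test of this kind can be sketched with simulated data. Everything below is an assumption made for illustration: the variable names, the sample size, and the true effect of 2 for the real treatment. The fake treatment is assigned at random and has no effect on the outcome by construction.

```r
set.seed(1)
n <- 5000

real_treat <- rbinom(n, 1, 0.5)      # treatment that actually changes the outcome
fake_treat <- rbinom(n, 1, 0.5)      # placebo: assigned at random, affects nothing
y <- 2 * real_treat + rnorm(n)       # outcome depends only on the real treatment

fit   <- lm(y ~ real_treat + fake_treat)
coefs <- coef(summary(fit))

real_est    <- coefs["real_treat", "Estimate"]
placebo_est <- coefs["fake_treat", "Estimate"]
placebo_p   <- coefs["fake_treat", "Pr(>|t|)"]

print(c(real = real_est, placebo = placebo_est, placebo_p = placebo_p))
```

Here the placebo estimate should be close to zero. A placebo estimate far from zero in real data would suggest that some assumption behind the analysis is wrong.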

4. Some variables that might affect both likelihood of attending tutoring and GPA include: student SES, previous education level/quality, courseload (how many courses they are taking), average course rigor (how difficult those courses are), and work ethic. It is not possible to measure and control for all of these variables. In particular, a student’s work ethic would be hard to estimate objectively (perhaps you could ask students how many hours they study per week, but self-reports are subject to error). Also, developing a standard metric for course rigor across a variety of disciplines might be difficult.

5. Under partial identification, you make the assumptions you’re mostly certain about (i.e., control for the variables you are mostly certain are confounders). For the remaining assumptions, you allow for a range of possibilities (i.e., don’t control for them) and work out the estimate over that range, which gives a range of possibilities for the estimate itself. For instance, Chapter 11 uses the example of sports cars and reckless driving, in which many variables (gender, age, income) are controlled for, but risk-taking propensity is not. Once the estimate is obtained with controls for gender, age, and income, we account for the variable we didn’t control for (risk-taking propensity) by narrowing the range of plausible values. Since risk-taking is likely to be positively correlated with both sports car ownership and reckless driving, leaving it out of the model would make the relationship appear overly positive. Therefore, we can conclude that our estimate is likely an overestimate, so the true effect is no larger than the estimate.
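
The direction of the bias in the sports car example can be checked with a small simulation. This is only a sketch under assumed numbers: the variable names, the sample size, and the true effect of 1 are all invented here, and risk-taking is simulated as observable so that the two regressions can be compared.

```r
set.seed(42)
n <- 10000

risk       <- rnorm(n)                          # risk-taking propensity (unobserved in practice)
sports_car <- as.numeric(risk + rnorm(n) > 0)   # risk-takers are more likely to own a sports car
reckless   <- 1 * sports_car + 1 * risk + rnorm(n)  # assumed true effect of sports_car is 1

# Estimate that leaves risk-taking out, as we are forced to do with real data
short <- coef(lm(reckless ~ sports_car))["sports_car"]

# Estimate that controls for risk-taking, possible only in the simulation
long <- coef(lm(reckless ~ sports_car + risk))["sports_car"]

print(c(omitting_risk = unname(short), controlling_risk = unname(long)))
```

With these assumed relationships, the estimate that omits risk-taking comes out larger than the one that controls for it, which is the sense in which the uncontrolled estimate acts as an upper bound.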

6a. Causal diagram below.

[Figure 6a: causal diagram, image not available]

6b. OthersBadDriving and YourFutureDriving

6c. TrafficSchool

6d. There is another path through which they are connected. We must not have included all relevant variables in our causal diagram.

7. The effect is no larger than 2 percentage points (D)